ECE 511 Exam 2 Fall 2005

Tuesday, December 13, 2005

· You are allowed to use any notes, books, papers, web sites, or other reference material as you desire. No interactions with others are allowed.

· This exam is designed to take 120 minutes to complete. To allow for any unforeseen difficulties, you are also allowed a 120-minute automatic extension. You are allowed to work on the exam for a continuous period of up to 240 minutes (four hours).

· This exam is based on lectures as well as class reading material. Each true/false question is concerned with one topic we covered in the course.

· The questions are randomly selected from the topics we covered this semester.

· You can write down the reasoning behind your choice for up to five questions for possible partial credit. Choose wisely.

· Please use the provided plain text template to submit your answers to w-hwu@uiuc.edu by noon, Wednesday, December 14, 2005.

· Good luck!

Part 1 (10 points): This part tests your understanding of predicated execution. Label each of the following statements as T (true) or F (false) according to the lectures and class reading.

A. (2 pts) According to Mahlke, et al., “A comparison between Full and Partial Predication Support for ILP processors,” the code sequence for a load instruction that is predicated with partial predication support using conditional move instructions requires control speculative load instructions. This is because of the need to ignore exceptions caused by the load instruction under execution conditions where the load instruction should not be executed.

B. (2 pts) According to Mahlke, et al., “A comparison between Full and Partial Predication Support for ILP processors,” a basic block with store instructions cannot be predicated with conditional move instructions. This is because there is not a good way to nullify the effect of the store instruction under the execution conditions where the store instruction should not be executed.

C. (2 pts) According to August, et al., “Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture,” predicate promotion allows the removal of predicates from predicated instructions, thus removing control dependences and reducing schedule length. This, however, requires control speculation support since the instructions whose predicates are removed execute more frequently as a result and are thus in effect speculative.

D. (2 pts) According to August, et al., “Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture,” the IMPACT EPIC Architecture supports an inline selective recovery model that eliminates the need to generate recovery blocks for control speculation.

E. (2 pts) According to Rau and Fisher, “Instruction-Level Parallel Processing – History, Overview, Perspective,” Lam attempts to achieve a better II than predication by scheduling each leg of the control construct separately. This achieves a smaller MII than predicated execution. However, this was later shown by Warter to cause larger II’s when there is complex resource usage.

Part 2 (10 points): This part tests your understanding of vector processing.

Label each of the following statements as T (true) or F (false) according to the lectures and class reading.

A. (2 pts) Vectorization of a loop involves loop distribution, which takes a loop with multiple statements in the loop body and generates multiple loops that each contain only one statement in the loop body.

B. (2 pts) A loop cannot be vectorized if there is a backward loop-carried dependence from a statement in the loop body to another one that appears earlier in the loop body.

C. (2 pts) According to Russell, “The CRAY-1 Computer System,” vector merge and test instructions are provided in the CRAY-1 to allow operations to be performed on individual vector register elements designated by the content of the vector mask register.

D. (2 pts) Vector chaining is a technique that takes advantage of parallel operation of function units by allowing a vector instruction to receive streamed results from another vector instruction as these results emerge from a function unit.

E. (2 pts) According to Russell, “The CRAY-1 Computer System,” the CRAY-1 Supercomputer has a vector chaining window of one clock cycle.
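
For readers who want a concrete picture of the loop distribution mentioned in statement A, here is a minimal sketch. It is not taken from the lectures or the reading; the array names and the functions `fused` and `distributed` are hypothetical. It only shows the transformation on one simple loop and takes no position on any of the statements above.

```python
def fused(b, c, e):
    """Original loop: two statements share one loop body."""
    n = len(b)
    a = [0] * n
    d = [0] * n
    for i in range(n):
        a[i] = b[i] + c[i]  # S1
        d[i] = a[i] * e[i]  # S2 reads the a[i] written by S1 in the same iteration
    return a, d


def distributed(b, c, e):
    """After loop distribution: one statement per loop."""
    n = len(b)
    a = [0] * n
    d = [0] * n
    for i in range(n):  # loop holding only S1; a candidate for one vector add
        a[i] = b[i] + c[i]
    for i in range(n):  # loop holding only S2; a candidate for one vector multiply
        d[i] = a[i] * e[i]
    return a, d
```

In this particular loop the only dependence runs forward from S1 to S2 within an iteration, so running all of S1 before all of S2 preserves the result.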

Part 3 (10 points): This part tests your understanding of parallel processing fundamentals. Label each of the following statements as T (true) or F (false) according to the lectures and class reading.

A. (2 pts) In the data parallel model, the same computation is applied to parts of a large data structure.

B. (2 pts) In a shared memory system, the same physical address refers to the same data for all processor elements in the system.

C. (2 pts) In a distributed memory (message passing) system, the physical address space of each processor does not need to accommodate the data of the program being executed as long as it can accommodate the portion of the data assigned to that processor.

D. (2 pts) According to Amdahl’s law, a parallel processing system that executes a computation with 90% parallel and 10% serial components can achieve no more than 100 times speedup regardless of the number of processors it has.

E. (2 pts) Mutual exclusion is designed for processors to correctly perform multi-step updates to shared data structures during parallel execution.
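
As a reminder of the formula behind Amdahl’s law, the following sketch computes the speedup for a given parallel fraction and processor count, along with the limit as the processor count grows. The function names are hypothetical, and the numbers in the usage note are chosen for illustration only; they are not the ones used in any statement above.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup = 1 / ((1 - p) + p / n) for parallel fraction p on n processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(parallel_fraction):
    """Upper bound on the speedup as the processor count grows without bound."""
    return 1.0 / (1.0 - parallel_fraction)
```

For example, with a parallel fraction of 0.75, `amdahl_speedup(0.75, 4)` is 16/7 (about 2.29), and `amdahl_limit(0.75)` is 4.0.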

Part 4 (10 points): This part tests your understanding of the Illinois (MESI) cache coherence protocol. Label each of the following statements as T (true) or F (false) according to the lectures and class reading.

A. (2 pts) The Illinois protocol defines four possible states for each cache line; two of the states indicate that the cache line is present in the local cache, not in any other caches.

B. (2 pts) In an Illinois protocol system, main memory supplies data only when no cache memory contains the data.

C. (2 pts) In an Illinois protocol system, when a cache line is in the “shared” state in the local cache, it must be present in at least one other cache according to the specification we gave in the lecture.

D. (2 pts) The Illinois protocol is a snooping protocol, which requires all processors to monitor all bus transactions in order to maintain correct state of their cache lines.

E. (2 pts) In an Illinois protocol system, if processor A requests a cache line that is in the “modified” state in processor B’s cache, the protocol forces a write-back by processor B’s cache to the main memory.
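
The following is a minimal sketch of the state transitions of one cache line under a generic textbook MESI protocol. It is an assumption-laden illustration, not the specification given in the lecture: it tracks states only, says nothing about who supplies data or when memory is updated, and all function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def on_processor_read(state, others_have_copy):
    """Next state of the local line after a load by the local processor."""
    if state is State.INVALID:
        # Read miss: the next state depends on whether another cache holds the line.
        return State.SHARED if others_have_copy else State.EXCLUSIVE
    return state  # read hit: no state change


def on_processor_write(state):
    """Next state after a store by the local processor.

    From SHARED or INVALID a real cache must first gain ownership over the
    bus; that bus activity is not modeled here.
    """
    return State.MODIFIED


def on_bus_read(state):
    """Next state after snooping a read by another cache."""
    if state in (State.MODIFIED, State.EXCLUSIVE):
        return State.SHARED
    return state


def on_bus_read_exclusive(state):
    """Next state after snooping another cache's request for ownership."""
    return State.INVALID
```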

Part 5 (10 points): This part tests your understanding of memory consistency models.

Label each of the following statements as T (true) or F (false) according to the lectures and class reading.

A. (2 pts) In a sequential consistency system, a store to a memory location cannot be done until all previous loads from the same processor have completed.

B. (2 pts) In a processor consistency system, a load from a memory location does not need to wait until all previous stores from the same processor have completed.

C. (2 pts) In a weak consistency system, synchronization instructions must be used to enforce ordering between loads and stores to different locations by the same processor.

D. (2 pts) The release consistency model improves upon the weak consistency model by using two distinct instructions for synchronization: one for acquiring locks and one for releasing locks. All previous stores from the same processor must complete before a release instruction can proceed but they do not have to complete before an acquire instruction can proceed.

E. (2 pts) In the DASH system, the implementation of release consistency requires a local processor to wait until all invalidation requests to remote clusters are acknowledged before an acquire instruction can proceed.
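
To make the notion of sequential consistency concrete, the sketch below enumerates every interleaving of a small two-thread test that preserves each thread's program order and collects the possible results. The test program, the variable and register names, and the helper functions are all hypothetical; the sketch models sequential consistency only and does not model the weaker models discussed in this part.

```python
from itertools import combinations

# Each operation is (kind, variable, register); list order is program order.
# Thread 0: x = 1; r1 = y        Thread 1: y = 1; r2 = x
THREAD0 = [("store", "x", None), ("load", "y", "r1")]
THREAD1 = [("store", "y", None), ("load", "x", "r2")]


def interleavings(t0, t1):
    """Yield every merge of t0 and t1 that keeps each thread's program order."""
    total = len(t0) + len(t1)
    for slots in combinations(range(total), len(t0)):
        it0, it1 = iter(t0), iter(t1)
        yield [next(it0) if i in slots else next(it1) for i in range(total)]


def run(order):
    """Execute one interleaving against a single shared memory, initially zero."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, var, reg in order:
        if kind == "store":
            memory[var] = 1
        else:
            registers[reg] = memory[var]
    return registers["r1"], registers["r2"]


def sc_outcomes():
    """Set of (r1, r2) results reachable under sequential consistency."""
    return {run(order) for order in interleavings(THREAD0, THREAD1)}
```

Under this model the reachable results are (0, 1), (1, 0), and (1, 1); the result (0, 0) never appears because some store always precedes both loads in any single global order.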

Part 6 (10 points): This part tests your understanding of directory-based cache coherence protocols and the DASH case study.

Label each of the following statements as T (true) or F (false) according to the lectures and class reading.

A. (2 pts) In a directory-based protocol, one must use invalidation rather than update for coherence activities.

B. (2 pts) In a directory-based protocol, each processor no longer needs to monitor all memory transactions made by all processors in the system.

C. (2 pts) In the DASH directory protocol, a load miss may trigger up to three system-level transactions: local processor to home cluster, home cluster to remote cluster, remote cluster to local processor.

D. (2 pts) In the DASH directory protocol, the directory for each memory line is designed to precisely track all remote clusters whose processor(s) contain the line in their caches.

E. (2 pts) The DASH directory protocol is based on the Illinois protocol in each cluster. Therefore, a store to an “exclusive” (exclusively owned) cache line does not require the local processor to perform any system transaction.

Part 7 (10 points): This part tests your understanding of multithreaded architectures.

Label each of the following statements as T (true) or F (false) according to the lectures and class reading.

A. (2 pts) In the Control Data 6600, the central processing unit is multithreaded into ten logical processors.

B. (2 pts) Multithreading in the Control Data 6600 is realized by dividing up a major cycle into ten minor cycles. During every major cycle, each of the ten logical processors occupies one of the ten minor cycles by performing one step of an instruction on its dynamic information (register contents).

C. (2 pts) Multithreading in the HEP processor is accomplished by allowing each process to take up one processor cycle on an alternating basis. By putting enough time separation between instructions from the same process, the processor does not need to provide any pipeline interlocking or bypassing logic.

D. (2 pts) According to Marr, et al., “Hyper-Threading Technology Architecture and Microarchitecture,” the trace cache does not need to be modified or expanded to accommodate multithreading.

E. (2 pts) According to Marr, et al., “Hyper-Threading Technology Architecture and Microarchitecture,” the Xeon implementation allows each logical processor to make progress even though another might fill up all of its allowed buffer entries due to a very long latency cache miss.

Part 8 (10 points): This part tests your understanding of virtual machine architectures. Label each of the following statements as T (true) or F (false) according to the lectures and class reading.

A. (2 pts) Automatic memory management with garbage collection eliminates the need for programmers to explicitly allocate and free heap data.

B. (2 pts) Virtual function calls and inheritance hierarchy allow programmers to introduce additional functionality by adding new functions without making changes to the original functions.

C. (2 pts) The “throw” and “catch” exception handling model allows an exception caused by a function deep in a function call chain to pass through the intermediate functions in the chain without being processed or even relayed by them. The exception can eventually be detected and handled by a function that is higher than these intermediate functions in the call chain.

D. (2 pts) Java improves security of application execution by not allowing pointer arithmetic operations and by performing array bounds checking. This eliminates the possibility for an application to examine or change the contents of memory locations that it is not allowed to access.

E. (2 pts) Running each application on top of its own VM isolates the application from the accidental and malicious failures of other applications in the system.
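
As an illustration of the “throw” and “catch” model referred to in statement C, the sketch below raises an exception three calls deep and handles it at the top of the chain. It is written in Python rather than Java for brevity, and the exception class and all function names are hypothetical.

```python
class SensorError(Exception):
    """Hypothetical application-specific exception."""


def read_sensor():
    # Deepest function in the chain: signals the failure.
    raise SensorError("no reading available")


def convert():
    # Intermediate function with no try/except of its own.
    return read_sensor() * 2


def report():
    # Another intermediate function with no try/except of its own.
    return convert() + 1


def main():
    # Top of the chain: the only place the exception is handled.
    try:
        return report()
    except SensorError as err:
        return f"handled: {err}"
```

Calling `main()` returns the string `handled: no reading available`.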

Part 9 (10 points): This part tests your understanding of special-purpose architectures. The questions are based on the lecture by Alben from NVIDIA. Label each of the following statements as T (true) or F (false) according to the lectures and class reading.

A. (2 pts) The NVIDIA GeForce 6800 achieves high compute efficiency by performing all computation in fixed-point arithmetic.

B. (2 pts) Texture filtering is performed during Vertex processing where the texture is painted on triangles.

C. (2 pts) Frame Buffers in the NVIDIA GeForce 6800 are based on SRAM technology for high-speed data access.

D. (2 pts) Pixel processors use vector processing to take advantage of the fact that pixels require similar computation.

E. (2 pts) The Z-buffer is used to compute the visibility of objects during rendering.
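
For reference, a depth test with a Z-buffer can be sketched as follows. This is a generic software illustration, not a description of the GeForce 6800 hardware; the fragment format `(x, y, z, color)`, the function name `render`, and the convention that a smaller `z` is closer to the viewer are all assumptions.

```python
def render(fragments, width, height, far=float("inf")):
    """Resolve per-pixel visibility with a depth (Z) buffer.

    fragments: iterable of (x, y, z, color); smaller z is assumed closer.
    Returns the color buffer as a list of rows; untouched pixels stay None.
    """
    depth = [[far] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:  # keep the fragment only if it is closer
            depth[y][x] = z
            color[y][x] = c
    return color
```

With the fragments `(0, 0, 5.0, "red")`, `(0, 0, 2.0, "blue")`, and `(0, 0, 9.0, "green")`, the pixel at (0, 0) ends up `"blue"` regardless of the order in which the fragments arrive.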